Efficient Exploration of Translation Variants in Large Multiparallel Corpora Using a Relational Database

Abstract

We present an approach for searching and exploring translation variants of multi-word units in large multiparallel corpora based on a relational database management system. Our web-based application Multilingwis, which allows for multilingual lookups of phrases and words in English, French, German, Italian and Spanish, is of interest to anybody who wants to quickly compare expressions across several languages, such as language learners without linguistic knowledge.

In this paper, we focus on the technical aspects of how to represent and efficiently retrieve all occurrences that match the user's query in one of the five languages, together with their translations into the other four languages. To identify such translations in our corpus of 220 million tokens in total, we use statistical sentence and word alignment.

By using materialized views, composite indexes, and pre-planned search functions, our relational database management system handles large result sets while placing only moderate demands on the underlying hardware. As our systematic evaluation on 200 search terms per language shows, we achieve retrieval times below 1 second in 75% of the cases for multi-word expressions.
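To make the three database techniques named in the abstract more concrete, here is a minimal sketch in Python using SQLite from the standard library. It is not the authors' schema or code: the abstract does not name the DBMS or describe the tables, so the table names (`token`, `alignment`, `aligned_pair`), the function names, and the toy data are all assumptions. SQLite has no materialized views, so a precomputed table built with `CREATE TABLE ... AS SELECT` stands in for one, and a fixed parameterized query stands in for a pre-planned search function. The sketch handles single lemmas only, not multi-word units.

```python
import sqlite3

# Hypothetical toy data: (id, language, lemma) and word-alignment links.
TOKENS = [
    (1, "en", "house"), (2, "de", "Haus"), (3, "fr", "maison"),
    (4, "en", "house"), (5, "de", "Gebäude"),
    (6, "en", "house"), (7, "de", "Haus"),
]
LINKS = [(1, 2), (1, 3), (4, 5), (6, 7)]  # (source token id, target token id)

# Fixed, parameterized lookup: stands in for a pre-planned search function.
VARIANT_QUERY = """
    SELECT trg_lang, trg_lemma, COUNT(*) AS n
    FROM aligned_pair
    WHERE src_lang = ? AND src_lemma = ?
    GROUP BY trg_lang, trg_lemma
    ORDER BY trg_lang, n DESC, trg_lemma
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory corpus with a precomputed, indexed lookup table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE token (
            id    INTEGER PRIMARY KEY,
            lang  TEXT NOT NULL,
            lemma TEXT NOT NULL
        );
        CREATE TABLE alignment (
            src_token_id INTEGER NOT NULL REFERENCES token(id),
            trg_token_id INTEGER NOT NULL REFERENCES token(id)
        );
    """)
    conn.executemany("INSERT INTO token VALUES (?, ?, ?)", TOKENS)
    conn.executemany("INSERT INTO alignment VALUES (?, ?)", LINKS)
    conn.executescript("""
        -- Stand-in for a materialized view: the join is computed once, up front.
        CREATE TABLE aligned_pair AS
        SELECT s.lang AS src_lang, s.lemma AS src_lemma,
               t.lang AS trg_lang, t.lemma AS trg_lemma
        FROM alignment a
        JOIN token s ON s.id = a.src_token_id
        JOIN token t ON t.id = a.trg_token_id;

        -- Composite index whose leading columns match the lookup's WHERE clause.
        CREATE INDEX idx_pair_lookup
            ON aligned_pair (src_lang, src_lemma, trg_lang, trg_lemma);
    """)
    return conn


def translation_variants(conn, lang, lemma):
    """Return (target language, target lemma, frequency) for one source lemma."""
    return conn.execute(VARIANT_QUERY, (lang, lemma)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for row in translation_variants(db, "en", "house"):
        print(row)
```

The design choice illustrated is to pay the cost of the alignment join once, when the lookup table is built, so that each user query becomes an indexed equality lookup followed by a small aggregation. The toy data stores alignment links in one direction only (English as source); a real system serving all five query languages would need the reverse directions as well.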